Day 18: Fine-Tuning Concepts and Strategies
Fine-tuning is the process of adapting a pre-trained LLM to your data and use case. However, fine-tuning is not always the answer. Today we learn the criteria for deciding when and how to fine-tune.
Prompting vs Fine-Tuning Decision
Before fine-tuning, first check whether prompt engineering can solve the problem. Use the following criteria to decide.
# Decision tree expressed as code
decision_tree = {
    "Does the current model + prompt produce the desired quality?": {
        "Yes": "Fine-tuning not needed. Focus on prompt optimization",
        "No": {
            "Does few-shot + RAG improve results?": {
                "Yes": "Build a RAG pipeline (recommended)",
                "No": {
                    "Is domain-specific knowledge needed?": {
                        "Yes": "Proceed with fine-tuning (LoRA/QLoRA recommended)",
                        "No": {
                            "Do you need to change output format/style?": {
                                "Yes": "Proceed with SFT (Supervised Fine-Tuning)",
                                "No": "Consider switching to a larger model",
                            }
                        },
                    }
                },
            }
        },
    }
}
# Decision criteria summary
criteria = [
    ("Prompting is sufficient", "General questions, simple format conversion, translation"),
    ("RAG is appropriate", "Up-to-date information needed, internal document-based Q&A"),
    ("Fine-tuning needed", "Domain-specific terminology, specific output style, consistent persona"),
]
for method, usecase in criteria:
    print(f"[{method}] {usecase}")
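The nested dict above can also be traversed programmatically by supplying "Yes"/"No" answers until a recommendation is reached. A minimal sketch (the tree is repeated so the snippet runs on its own; `walk` is an illustrative helper, not a library function):

```python
# Decision tree from above, repeated so this snippet is self-contained
decision_tree = {
    "Does the current model + prompt produce the desired quality?": {
        "Yes": "Fine-tuning not needed. Focus on prompt optimization",
        "No": {
            "Does few-shot + RAG improve results?": {
                "Yes": "Build a RAG pipeline (recommended)",
                "No": {
                    "Is domain-specific knowledge needed?": {
                        "Yes": "Proceed with fine-tuning (LoRA/QLoRA recommended)",
                        "No": {
                            "Do you need to change output format/style?": {
                                "Yes": "Proceed with SFT (Supervised Fine-Tuning)",
                                "No": "Consider switching to a larger model",
                            }
                        },
                    }
                },
            }
        },
    }
}

def walk(tree, answers):
    """Follow a list of "Yes"/"No" answers down the tree to a recommendation."""
    node = tree
    for answer in answers:
        question = next(iter(node))  # each level holds exactly one question
        node = node[question][answer]
        if isinstance(node, str):    # reached a recommendation leaf
            return node
    raise ValueError("more answers needed to reach a recommendation")

print(walk(decision_tree, ["No", "No", "Yes"]))
# prints: Proceed with fine-tuning (LoRA/QLoRA recommended)
```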
Full Fine-Tuning vs Parameter-Efficient Fine-Tuning
Full fine-tuning updates all of the model's parameters. It is effective but requires enormous GPU memory. PEFT (Parameter-Efficient Fine-Tuning) instead trains only a small fraction of the parameters (often under 1%) and achieves comparable results.
# Comparison table by fine-tuning method
comparison = {
    "Method": ["Full FT", "LoRA", "QLoRA", "Prompt Tuning"],
    "Trained params": ["100%", "0.1~1%", "0.1~1%", "<0.01%"],
    "GPU memory": ["7B=28GB+", "7B=16GB", "7B=6GB", "7B=16GB"],
    "Training speed": ["Slow", "Medium", "Medium", "Fast"],
    "Performance": ["Best", "Very good", "Good", "Limited"],
    "Recommended": ["Large budget", "General", "GPU limited", "Quick experiments"],
}

# Print table
for key, values in comparison.items():
    print(f"{key:16} | {' | '.join(values)}")
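The "0.1~1%" figure for LoRA can be sanity-checked with back-of-envelope arithmetic. A sketch assuming a typical 7B-class architecture (32 layers, hidden size 4096) with LoRA applied only to the q/v attention projections, a common default; `lora_param_count` is an illustrative helper:

```python
# Each adapted d_out x d_in weight gains two low-rank factors:
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) new parameters.
def lora_param_count(n_layers=32, hidden=4096, rank=16, matrices_per_layer=2):
    per_matrix = rank * (hidden + hidden)  # square projections: d_in = d_out
    return n_layers * matrices_per_layer * per_matrix

trainable = lora_param_count()
total = 7_000_000_000
print(f"Trainable: {trainable:,} ({trainable / total:.2%} of 7B)")
# prints: Trainable: 8,388,608 (0.12% of 7B)
```

Adapting more matrices (k/o projections, MLP layers) or raising the rank pushes the share toward the upper end of the 0.1~1% range.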
Data Requirements Guide
# Recommended data volume by fine-tuning task
data_requirements = {
    "Text classification": {"min": "500", "recommended": "2,000~5,000", "note": "100+ per class"},
    "Sentiment analysis": {"min": "1,000", "recommended": "5,000~10,000", "note": "Label balance important"},
    "Summarization": {"min": "1,000 pairs", "recommended": "5,000~20,000 pairs", "note": "Source-summary pairs"},
    "Dialogue (chatbot)": {"min": "500 convos", "recommended": "3,000~10,000 convos", "note": "Include diverse scenarios"},
    "Domain adaptation": {"min": "5,000", "recommended": "10,000~50,000", "note": "Domain text"},
    "Code generation": {"min": "2,000 pairs", "recommended": "10,000~50,000 pairs", "note": "Description-code pairs"},
}

for task, info in data_requirements.items():
    print(f"\n[{task}]")
    print(f"  Minimum: {info['min']}, Recommended: {info['recommended']}")
    print(f"  Note: {info['note']}")
Key principle: quality matters more than quantity. A well-curated set of 1,000 samples beats 10,000 noisy ones. Always review your data manually before fine-tuning.
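Basic automated checks catch the worst noise before the manual review. A minimal curation sketch, assuming instruction/response pairs stored as dicts; the exact-duplicate and minimum-length checks are illustrative thresholds, not fixed rules:

```python
def curate(samples, min_response_chars=20):
    """Drop exact duplicates and near-empty responses from a sample list."""
    seen = set()
    kept = []
    for s in samples:
        key = (s["instruction"].strip(), s["response"].strip())
        if key in seen:
            continue  # exact duplicate of an earlier sample
        if len(s["response"].strip()) < min_response_chars:
            continue  # response too short to be a useful training signal
        seen.add(key)
        kept.append(s)
    return kept

samples = [
    {"instruction": "Summarize the report", "response": "The report covers Q3 revenue trends in detail."},
    {"instruction": "Summarize the report", "response": "The report covers Q3 revenue trends in detail."},  # duplicate
    {"instruction": "Summarize the memo", "response": "ok"},  # too short
]
print(len(curate(samples)))  # prints: 1
```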
Fine-Tuning Cost Estimation
For a 7B model, LoRA fine-tuning typically takes a few hours on a single A100. Cloud GPU costs (Lambda, RunPod, etc.) are around $1~3 per A100-hour, so expect approximately $5~10 per training run.
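The estimate is simply hours multiplied by the hourly GPU price. A back-of-envelope sketch with illustrative defaults (real numbers vary with dataset size, sequence length, and provider):

```python
def training_cost(hours, usd_per_hour, runs=1):
    """Rough cloud-GPU cost: wall-clock hours x hourly price x number of runs."""
    return hours * usd_per_hour * runs

# Example: 3-hour LoRA run at $2/hour, repeated for 2 hyperparameter sweeps
print(f"${training_cost(3, 2.0, runs=2):.2f}")  # prints: $12.00
```

Budgeting for several runs is realistic: the first attempt rarely uses the final hyperparameters.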
Today’s Exercises
- Choose one of your real-world tasks, follow the decision tree above, determine whether prompting/RAG/fine-tuning is appropriate, and document the reasoning.
- Research the parameter counts and model sizes (GB) of 7B, 13B, and 70B models on Hugging Face Hub, then estimate the GPU memory required for Full FT/LoRA/QLoRA for each.
- Investigate how fine-tuning data can be collected in your domain (medical, legal, finance, etc.) and document at least 3 data sources.
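For exercise 2, a rule-of-thumb memory sketch can serve as a starting point. The bytes-per-parameter floors below are assumptions consistent with the comparison table above (activations, optimizer state, and adapter weights add more on top, hence the "+"):

```python
# Rough floors: Full FT = fp16 weights (2) + fp16 gradients (2) -> 4 bytes/param;
# LoRA = fp16 frozen weights (2) plus tiny adapters; QLoRA = 4-bit weights (0.5).
BYTES_PER_PARAM = {"Full FT": 4.0, "LoRA": 2.0, "QLoRA": 0.5}

def min_memory_gb(n_params_billion, method):
    # 1B params x 1 byte is roughly 1 GB, so billions x bytes/param ~= GB
    return n_params_billion * BYTES_PER_PARAM[method]

for size in (7, 13, 70):
    row = ", ".join(f"{m}: {min_memory_gb(size, m):g}GB+" for m in BYTES_PER_PARAM)
    print(f"{size}B -> {row}")
```

The 7B Full FT floor (28GB+) matches the comparison table; compare your Hugging Face Hub findings against these floors to see how much headroom activations and optimizer state actually need.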